Search results for "cluster validation"

showing 6 items of 6 documents

A Simple Cluster Validation Index with Maximal Coverage

2017

Clustering is an unsupervised technique for detecting general, distinct profiles in a given dataset. Just as there are many clustering methods and algorithms, there are many cluster validation methods and indices for suggesting the number of clusters. The purpose of this paper is, firstly, to propose a new, simple internal cluster validation index. The index has maximal coverage: it can also detect a single cluster, i.e., the lack of any division of a dataset into disjoint subsets. Secondly, the proposed index is compared to the available indices from five different packages implemented in R or Matlab to assess its usefulness. The comparison also suggests many interesting f…

ComputingMethodologies_PATTERNRECOGNITION; cluster validation
researchProduct
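
As an illustration of how an internal validation index can suggest the number of clusters, the following minimal sketch scores k-means partitions with the classical Calinski-Harabasz index. It is not the index proposed in the 2017 paper above; the toy data, the farthest-point initialisation, and all function names are assumptions made for this example. Note that Calinski-Harabasz is undefined for k = 1, so this sketch cannot detect the single-cluster case that the abstract highlights.

```python
from math import dist

def centroid(members):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(members) for coords in zip(*members))

def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: dist(p, centers[j]))
            for p in points]

def kmeans(points, k, iters=50):
    """Lloyd's algorithm with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        # Add the point that is farthest from its nearest chosen center.
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    for _ in range(iters):
        labels = assign(points, centers)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old center if a cluster happens to become empty.
            updated.append(centroid(members) if members else centers[j])
        if updated == centers:
            break
        centers = updated
    return assign(points, centers), centers

def calinski_harabasz(points, labels, centers):
    """Ratio of between-cluster to within-cluster dispersion (needs 2 <= k < n)."""
    n, k = len(points), len(centers)
    overall = centroid(points)
    within = sum(dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    between = sum(labels.count(j) * dist(centers[j], overall) ** 2 for j in range(k))
    return (between / (k - 1)) / (within / (n - k))

# Toy data (assumption): three tight groups of four points each.
corners = [(0, 0), (1, 0), (0, 1), (1, 1)]
points = [(x + dx, y + dy) for x, y in [(0, 0), (10, 0), (0, 10)] for dx, dy in corners]

scores = {}
for k in range(2, 6):
    labels, centers = kmeans(points, k)
    scores[k] = calinski_harabasz(points, labels, centers)

best_k = max(scores, key=scores.get)
print(scores, "suggested k:", best_k)
```

The index is evaluated for each candidate k and the maximiser is taken as the suggested number of clusters; other indices use a minimum or a knee point instead.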

Comparison of cluster validation indices with missing data

2018

Clustering is an unsupervised machine learning technique that aims to divide a given set of data into subsets. The number of hidden groups in cluster analysis is not always obvious, and various cluster validation indices have been suggested for determining it. Some recent studies have reviewed validation indices, but no experiments with missing data are yet available. In this paper, the performance of ten well-known indices is measured on ten synthetic data sets with various ratios of missing values, using clustering based on squared Euclidean and city block distances. The original indices are modified for the city block distance in a novel way. Experiments illustrate the di…

data; cluster analysis; cluster validation; clustering
researchProduct
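
To make the idea of distance computation with missing values concrete, here is a small sketch of one common approach, the partial distance strategy: the distance is computed over the jointly observed components and rescaled to the full dimension. Whether the 2018 paper above uses exactly this strategy is not stated in the abstract, so treat it as an assumption; the function name `partial_distance` and the use of NaN to mark missing values are also choices made for this example.

```python
from math import isnan, nan

def partial_distance(x, y, metric="sqeuclidean"):
    """Distance over jointly observed components, rescaled to full dimension.

    Missing values are encoded as NaN (an assumption of this sketch).
    """
    pairs = [(a, b) for a, b in zip(x, y) if not (isnan(a) or isnan(b))]
    if not pairs:
        return nan  # no jointly observed component, so the distance is unknown
    if metric == "sqeuclidean":
        total = sum((a - b) ** 2 for a, b in pairs)
    elif metric == "cityblock":
        total = sum(abs(a - b) for a, b in pairs)
    else:
        raise ValueError(f"unsupported metric: {metric}")
    # Scale up as if the unobserved components contributed on average
    # as much as the observed ones.
    return total * len(x) / len(pairs)

x = [1.0, nan, 3.0, 4.0]
y = [2.0, 5.0, nan, 6.0]
sq = partial_distance(x, y, "sqeuclidean")
cb = partial_distance(x, y, "cityblock")
print(sq, cb)
```

Only the first and last components are observed in both vectors, so both distances are computed from two components and multiplied by 4/2.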

Toolbox for Distance Estimation and Cluster Validation on Data With Missing Values

2022

Missing data are unavoidable in the real-world application of unsupervised machine learning, and their nonoptimal processing may decrease the quality of data-driven models. Imputation is a common remedy for missing values, but direct estimation of expected distances has also emerged as an alternative. Because the treatment of missing values is rarely considered in clustering-related tasks, and distance metrics play a central role in both clustering and cluster validation, we developed a new toolbox that provides a wide range of algorithms for data preprocessing, distance estimation, clustering, and cluster validation in the presence of missing values. All these are core elements in any comprehensive cluster analy…

modelling; General Computer Science; distance estimation; 020209 energy; General Engineering; quality; 02 engineering and technology; TK1-9971; missing values; clusters; machine learning; data; validation; algorithms; 0202 electrical engineering electronic engineering information engineering; 020201 artificial intelligence & image processing; General Materials Science; Missing values; Electrical engineering. Electronics. Nuclear engineering; cluster validation; data processing; clustering; IEEE Access
researchProduct

DOCUMENT MANAGEMENT USING CLUSTERING ALGORITHMS

2015

Document management systems are complex systems that offer services such as storage, versioning, metadata, and security, as well as indexing and retrieval capabilities. Large numbers of documents could be automatically grouped into classes of documents that contain similar information. Therefore, we propose to use clustering methods to group the documents. Clustering is an important process in text mining, used for grouping documents based on their contents in order to extract knowledge. In this paper we present some requirements for clustering algorithms for a document management system.

jel:Y80; ComputingMethodologies_DOCUMENTANDTEXTPROCESSING; Management; Document Management; Clustering; Cluster Validation; Revista Economica
researchProduct

Knowledge discovery from physical activity

2017

This master's thesis goes through the Knowledge Discovery in Databases (KDD) process and the possibilities of applying it to data related to physical activity. The KDD process consists of many different stages, including preprocessing, data transformation, and data mining. In this thesis, clustering is used as the data mining method and is covered in detail. We also compare a broad set of clustering indices (CVAIs) and their different implementations with k-means clustering, and present the best of these in a more general form. In the empirical part of the thesis, activity data from seventh-grade schoolchildren are studied following the KDD process and utilizing …

clusters; cluster validation index; knowledge discovery; physical activity; data mining
researchProduct

The Three Steps of Clustering in the Post-Genomic Era: A Synopsis

2011

Clustering is one of the best-known activities in scientific investigation and the object of research in many disciplines, ranging from Statistics to Computer Science. Following Handl et al., it can be summarized as a three-step process: (a) choice of a distance function; (b) choice of a clustering algorithm; (c) choice of a validation method. Although such a purist approach to clustering is hardly seen in many areas of science, genomic data require that level of attention if inferences made from cluster analysis are to be of some relevance to biomedical research. Unfortunately, the high dimensionality of the data and their noisy nature make cluster analysis of genomic data particul…

cluster validation indices; Settore INF/01 - Informatica; Process (engineering); Computer science; business.industry; Genomic data; distance function; Machine learning; computer.software_genre; Object (computer science); Clustering; Cluster algorithm; Predictive power; Relevance (information retrieval); Artificial intelligence; High dimensionality; business; Cluster analysis; computer
researchProduct
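
The three-step process summarized in the 2011 abstract above, (a) a distance function, (b) a clustering algorithm, (c) a validation method, can be sketched as three interchangeable pieces. The concrete choices here (Euclidean or city block distance, a simple alternating k-medoids, the silhouette width) and the toy data are assumptions for illustration only, not the methods surveyed in the paper.

```python
from math import dist

# Step (a): distance functions. `dist` is Euclidean; city block is defined here.
def cityblock(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def assign(points, medoids, distance):
    """Label each point with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda j: distance(p, medoids[j]))
            for p in points]

# Step (b): a simple alternating k-medoids that works with any distance.
def k_medoids(points, k, distance, iters=20):
    medoids = list(points[:k])  # naive deterministic start (assumption)
    for _ in range(iters):
        labels = assign(points, medoids, distance)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # New medoid: the member with the smallest total distance to the rest.
            best = min(members, key=lambda m: sum(distance(m, q) for q in members)) \
                if members else medoids[j]
            updated.append(best)
        if updated == medoids:
            break
        medoids = updated
    return assign(points, medoids, distance)

# Step (c): mean silhouette width as the validation measure (needs >= 2 clusters).
def silhouette(points, labels, distance):
    widths = []
    for i, p in enumerate(points):
        own = [distance(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            widths.append(0.0)  # convention for singleton clusters
            continue
        a = sum(own) / len(own)
        b = min(
            sum(d) / len(d)
            for c in set(labels) - {labels[i]}
            for d in [[distance(p, q) for q, lab in zip(points, labels) if lab == c]]
        )
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(widths) / len(widths)

points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
results = {}
for name, distance in [("euclidean", dist), ("cityblock", cityblock)]:
    labels = k_medoids(points, 2, distance)
    results[name] = (labels, silhouette(points, labels, distance))
print(results)
```

Keeping the distance as a parameter of both the clustering and the validation step means the same distance is used consistently across all three choices.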